{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inception V1 Example\n",
    "In this notebook we will go through the process of converting the Inception V1 model to a Neural Network Classifier CoreML model that directly predicts the class label of the input image. We will highlight the importance of setting the image preprocessing parameters correctly to get the right results. \n",
    "Lets get started!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets first download the inception V1 frozen TF graph (the .pb file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download the model and class label package\n",
    "from __future__ import print_function\n",
    "import  os, sys\n",
    "import tarfile"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def download_file_and_unzip(url, dir_path='.'):\n",
    "    \"\"\"Download the frozen TensorFlow model and unzip it.\n",
    "    url - The URL address of the frozen file\n",
    "    dir_path - local directory\n",
    "    \"\"\"\n",
    "    if not os.path.exists(dir_path):\n",
    "        os.makedirs(dir_path)\n",
    "    k = url.rfind('/')\n",
    "    fname = url[k+1:]\n",
    "    fpath = os.path.join(dir_path, fname)\n",
    "\n",
    "    if not os.path.exists(fpath):\n",
    "        if sys.version_info[0] < 3:\n",
    "            import urllib\n",
    "            urllib.urlretrieve(url, fpath)\n",
    "        else:\n",
    "            import urllib.request\n",
    "            urllib.request.urlretrieve(url, fpath)\n",
    "\n",
    "    tar = tarfile.open(fpath)\n",
    "    tar.extractall(dir_path)\n",
    "    tar.close()\n",
    "\n",
    "inception_v1_url = 'https://storage.googleapis.com/download.tensorflow.org/models/inception_v1_2016_08_28_frozen.pb.tar.gz'\n",
    "download_file_and_unzip(inception_v1_url)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For conversion to CoreML, we need to find the input and output tensor names in the TF graph. This will also be required to run the TF graph for numerical accuracy check. Lets load the TF graph def and try to find the names"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the TF graph definition\n",
    "import tensorflow as tf\n",
    "tf_model_path = './inception_v1_2016_08_28_frozen.pb'\n",
    "with open(tf_model_path, 'rb') as f:\n",
    "    serialized = f.read()\n",
    "tf.reset_default_graph()\n",
    "original_gdef = tf.GraphDef()\n",
    "original_gdef.ParseFromString(serialized)\n",
    "\n",
    "# Lets get some details about a few ops in the beginning and the end of the graph\n",
    "with tf.Graph().as_default() as g:\n",
    "    tf.import_graph_def(original_gdef, name='')\n",
    "    ops = g.get_operations()\n",
    "    N = len(ops)\n",
    "    for i in [0,1,2,N-3,N-2,N-1]:\n",
    "        print('\\n\\nop id {} : op type: \"{}\"'.format(str(i), ops[i].type));\n",
    "        print('input(s):'),\n",
    "        for x in ops[i].inputs:\n",
    "            print(\"name = {}, shape: {}, \".format(x.name, x.get_shape())),\n",
    "        print('\\noutput(s):'),\n",
    "        for x in ops[i].outputs:\n",
    "            print(\"name = {}, shape: {},\".format(x.name, x.get_shape())),           "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output of the Placeholder op is the input (\"input:0\") and the output of the Softmax op towards the end of the graph is the output (\"InceptionV1/Logits/Predictions/Softmax:0\"). Lets convert to mlmodel now."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tfcoreml\n",
    "# Supply a dictionary of input tensors' name and shape (with batch axis)\n",
    "input_tensor_shapes = {\"input:0\":[1,224,224,3]} # batch size is 1\n",
    "#providing the image_input_names argument converts the input into an image for CoreML\n",
    "image_input_name = ['input:0']\n",
    "# Output CoreML model path\n",
    "coreml_model_file = './inception_v1.mlmodel'\n",
    "# The TF model's ouput tensor name\n",
    "output_tensor_names = ['InceptionV1/Logits/Predictions/Softmax:0']\n",
    "# class label file: providing this will make a \"Classifier\" CoreML model\n",
    "class_labels = 'imagenet_slim_labels.txt'\n",
    "\n",
    "# Call the converter. This may take a while\n",
    "coreml_model = tfcoreml.convert(\n",
    "        tf_model_path=tf_model_path,\n",
    "        mlmodel_path=coreml_model_file,\n",
    "        input_name_shape_dict=input_tensor_shapes,\n",
    "        output_feature_names=output_tensor_names,\n",
    "        image_input_names = image_input_name,\n",
    "        class_labels = class_labels)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets load an image for testing. We will get predictions on this image using the TF model and the corresponding mlmodel."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Now we're ready to test out the CoreML model with a real image!\n",
    "# Load an image\n",
    "import numpy as np\n",
    "import PIL\n",
    "import requests\n",
    "from io import BytesIO\n",
    "from matplotlib.pyplot import imshow\n",
    "# This is an image of a golden retriever from Wikipedia\n",
    "img_url = 'https://upload.wikimedia.org/wikipedia/commons/9/93/Golden_Retriever_Carlos_%2810581910556%29.jpg'\n",
    "response = requests.get(img_url)\n",
    "%matplotlib inline\n",
    "img = PIL.Image.open(BytesIO(response.content))\n",
    "imshow(np.asarray(img))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# for getting CoreML predictions we directly pass in the PIL image after resizing\n",
    "import coremltools\n",
    "img = img.resize([224,224], PIL.Image.ANTIALIAS)\n",
    "coreml_inputs = {'input__0': img}\n",
    "coreml_output = coreml_model.predict(coreml_inputs, useCPUOnly=True)\n",
    "coreml_pred_dict = coreml_output['InceptionV1__Logits__Predictions__Softmax__0']\n",
    "coreml_predicted_class_label = coreml_output['classLabel']\n",
    "\n",
    "#for getting TF prediction we get the numpy array of the image\n",
    "img_np = np.array(img).astype(np.float32)\n",
    "print( 'image shape:', img_np.shape)\n",
    "print( 'first few values: ', img_np.flatten()[0:4], 'max value: ', np.amax(img_np))\n",
    "img_tf = np.expand_dims(img_np, axis = 0) #now shape is [1,224,224,3] as required by TF\n",
    "\n",
    "# Evaluate TF and get the highest label \n",
    "tf_input_name = 'input:0'\n",
    "tf_output_name = 'InceptionV1/Logits/Predictions/Softmax:0'\n",
    "with tf.Session(graph = g) as sess:\n",
    "    tf_out = sess.run(tf_output_name, \n",
    "                      feed_dict={tf_input_name: img_tf})\n",
    "tf_out = tf_out.flatten()    \n",
    "idx = np.argmax(tf_out)\n",
    "label_file = 'imagenet_slim_labels.txt' \n",
    "with open(label_file) as f:\n",
    "    labels = f.readlines()\n",
    "    \n",
    "#print predictions   \n",
    "print('\\n')\n",
    "print(\"CoreML prediction class = {}, probabiltiy = {}\".format(coreml_predicted_class_label,\n",
    "                                            str(coreml_pred_dict[coreml_predicted_class_label])))  \n",
    "print(\"TF prediction class = {}, probability = {}\".format(labels[idx],\n",
    "                                            str(tf_out[idx])))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both the predictions match, this means that the conversion was correct. However, the class label seems incorrect. What could be the reason? The answer is that we did not preprocess the image correctly before passing it to the neural network!! This is always a crucial step when using neural networks on images.\n",
    "\n",
    "How do we know what preprocessing to apply? This can be tricky to find sometimes. The approach is to find the source of the pre-trained model and check for the preprocessing that the author of the model used while training and evaluation. In this case, the TF model comes from the SLIM library so we find the preprocessing steps [here](https://github.com/tensorflow/models/blob/edb6ed22a801665946c63d650ab9a0b23d98e1b1/research/slim/preprocessing/inception_preprocessing.py#L243)\n",
    "\n",
    "We see that the image pixels have to be scaled to lie in the interval [-1,1]. Lets do that and get the TF predictions again! "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img_tf = (2.0/255.0) * img_tf - 1\n",
    "with tf.Session(graph = g) as sess:\n",
    "    tf_out = sess.run(tf_output_name, \n",
    "                      feed_dict={tf_input_name: img_tf})\n",
    "tf_out = tf_out.flatten()    \n",
    "idx = np.argmax(tf_out)\n",
    "print(\"TF prediction class = {}, probability = {}\".format(labels[idx],\n",
    "                                            str(tf_out[idx])))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Much better now! The model is predicting a dog as the highest class. \n",
    "\n",
    "What about CoreML? CoreML automatically handles the image preprocessing, when the input is of type image, so we do not have to change the input that we were passing in earlier. For the mlmodel we converted, lets see what the image biases and scale have been set to"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get image pre-processing parameters of a saved CoreML model\n",
    "from coremltools.proto import FeatureTypes_pb2 as _FeatureTypes_pb2\n",
    "\n",
    "\n",
    "spec = coremltools.models.utils.load_spec(coreml_model_file)\n",
    "if spec.WhichOneof('Type') == 'neuralNetworkClassifier':\n",
    "  nn = spec.neuralNetworkClassifier\n",
    "if spec.WhichOneof('Type') == 'neuralNetwork':\n",
    "  nn = spec.neuralNetwork  \n",
    "if spec.WhichOneof('Type') == 'neuralNetworkRegressor':\n",
    "  nn = spec.neuralNetworkRegressor\n",
    "\n",
    "preprocessing = nn.preprocessing[0].scaler\n",
    "print( 'channel scale: ', preprocessing.channelScale)\n",
    "print( 'blue bias: ', preprocessing.blueBias)\n",
    "print( 'green bias: ', preprocessing.greenBias)\n",
    "print( 'red bias: ', preprocessing.redBias)\n",
    "\n",
    "inp = spec.description.input[0]\n",
    "if inp.type.WhichOneof('Type') == 'imageType':\n",
    "  colorspace = _FeatureTypes_pb2.ImageFeatureType.ColorSpace.Name(inp.type.imageType.colorSpace)\n",
    "  print( 'colorspace: ', colorspace)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As suspected, they are not correct. Lets convert the model again and set them correctly this time. Note that the channel scale is multiplied first and then the bias is added. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Call the converter. This may take a while\n",
    "coreml_model = tfcoreml.convert(\n",
    "        tf_model_path=tf_model_path,\n",
    "        mlmodel_path=coreml_model_file,\n",
    "        input_name_shape_dict=input_tensor_shapes,\n",
    "        output_feature_names=output_tensor_names,\n",
    "        image_input_names = image_input_name,\n",
    "        class_labels = class_labels,\n",
    "        red_bias = -1,\n",
    "        green_bias = -1,\n",
    "        blue_bias = -1,\n",
    "        image_scale = 2.0/255.0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Call CoreML predict again\n",
    "coreml_output = coreml_model.predict(coreml_inputs, useCPUOnly=True)\n",
    "coreml_pred_dict = coreml_output['InceptionV1__Logits__Predictions__Softmax__0']\n",
    "coreml_predicted_class_label = coreml_output['classLabel']\n",
    "print(\"CoreML prediction class = {}, probability = {}\".format(coreml_predicted_class_label,\n",
    "                                            str(coreml_pred_dict[coreml_predicted_class_label])))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Yes, now its matching the TF output and is correct!!\n",
    "\n",
    "Note that predictions with the default CoreML predict call (when the flag useCPUOnly=True is skipped) may vary slightly since it uses a lower precision optimized path that runs faster. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
